Generalized Gamma distribution (gengamma)#

The generalized gamma distribution (scipy.stats.gengamma) is a flexible continuous distribution on \((0,\infty)\). It is widely used for modeling positive data (lifetimes, waiting times, sizes) because it can interpolate between several classic families.

A key identity makes the distribution easy to work with:

\[ Y \sim \mathrm{Gamma}(a, 1), \qquad X = \mathrm{loc} + \mathrm{scale}\,Y^{1/c}. \]

Equivalently, in the standard form (loc=0, scale=1) we have \(X^c \sim \mathrm{Gamma}(a,1)\).

What you’ll learn#

  • the PDF/CDF and the role of the incomplete gamma function

  • moment conditions and closed-form raw moments

  • how parameters \(a\), \(c\), and scale change the shape

  • a NumPy-only sampler (via Gamma sampling) + Monte Carlo validation

  • practical usage via scipy.stats.gengamma (pdf, cdf, rvs, fit)

import numpy as np

import plotly
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import special, stats
from scipy.stats import gengamma as gengamma_dist

# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

# Reproducibility
rng = np.random.default_rng(42)

np.set_printoptions(precision=5, suppress=True)

# Record versions for reproducibility (useful when numerical details matter).
VERSIONS = {"numpy": np.__version__, "scipy": scipy.__version__, "plotly": plotly.__version__}

1) Title & Classification#

  • Name: gengamma (generalized gamma; SciPy: scipy.stats.gengamma)

  • Type: Continuous

  • Support (standard form): \(x \in (0,\infty)\)

  • Parameter space (standard form): shape parameters \(a>0\), \(c\ne 0\)

  • SciPy location/scale: loc \(\in \mathbb{R}\), scale \(> 0\), with \(X = \mathrm{loc} + \mathrm{scale}\,Y\), where \(Y \sim \mathrm{GenGamma}(a,c)\). The support becomes \(x>\mathrm{loc}\).

Unless stated otherwise, we work with the standard form (loc=0, scale=1).

2) Intuition & Motivation#

What it models#

The generalized gamma is a shape-flexible model for positive data. Depending on its parameters it can resemble a Gamma- or Weibull-like right-skewed distribution, and for negative values of \(c\) it has heavy, polynomially decaying tails.

A very useful lens is a power transform: after adjusting for scale, the random variable \(X^c\) is Gamma-distributed. That means many properties can be derived by reducing the problem to the Gamma distribution.

Typical real-world use cases#

  • Reliability / survival analysis: time-to-failure, repair times, waiting times.

  • Hydrology / climate: rainfall amounts and other strictly-positive quantities.

  • Economics / operations: positive durations and sizes with flexible skew/tails.

  • Generative modeling: a flexible positive noise model or a component in mixture models.

Relations to other distributions (special cases)#

In the standard form (loc=0, scale=1) for \(c>0\):

  • \(c = 1\) gives the Gamma distribution: \(X \sim \mathrm{Gamma}(a,1)\).

  • \(a = 1\) gives the Weibull distribution: \(X \sim \mathrm{Weibull}(k=c,\lambda=1)\).

  • \(a = 1,\; c = 1\) gives the Exponential distribution (rate 1).

  • \(c = 2\) with \(a = \nu/2\) and scale \(= \sqrt{2}\) yields the Chi distribution with \(\nu\) degrees of freedom.

More generally, \(Y=(X/\mathrm{scale})^c \sim \mathrm{Gamma}(a,1)\), which is the workhorse relationship we will exploit.
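
A quick numerical check of these special cases, using the imports from the setup cell (the grid and parameter values below are arbitrary illustrations):

# Verify the special cases by comparing PDFs on a grid.
xg = np.linspace(0.05, 6.0, 200)

# c = 1  ->  Gamma(a) with unit scale
print(np.allclose(stats.gengamma.pdf(xg, 2.5, 1.0), stats.gamma.pdf(xg, 2.5)))

# a = 1  ->  Weibull with shape k = c
print(np.allclose(stats.gengamma.pdf(xg, 1.0, 1.7), stats.weibull_min.pdf(xg, 1.7)))

# a = nu/2, c = 2, scale = sqrt(2)  ->  Chi with nu degrees of freedom
nu = 5
print(np.allclose(stats.gengamma.pdf(xg, nu / 2, 2.0, scale=np.sqrt(2)), stats.chi.pdf(xg, nu)))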

3) Formal Definition#

PDF (standard form)#

For \(x>0\), \(a>0\), \(c\ne 0\), the generalized gamma PDF is

\[ f(x; a,c) = \frac{|c|}{\Gamma(a)}\,x^{ca-1}\,\exp(-x^c), \qquad x>0. \]

PDF with loc/scale#

SciPy uses the location/scale transform \(x = \mathrm{loc} + \mathrm{scale}\,y\) with scale>0. Writing \(y=(x-\mathrm{loc})/\mathrm{scale}\),

\[ f_X(x; a,c,\mathrm{loc},\mathrm{scale}) = \frac{1}{\mathrm{scale}}\, f\!\left( \frac{x-\mathrm{loc}}{\mathrm{scale}}; a,c \right), \qquad x>\mathrm{loc}. \]

CDF#

Let \(u = \left(\frac{x-\mathrm{loc}}{\mathrm{scale}}\right)^c\) for \(x>\mathrm{loc}\), and let

  • \(\gamma(a,u) = \int_0^u t^{a-1}e^{-t}\,dt\) be the lower incomplete gamma,

  • \(\Gamma(a,u) = \int_u^\infty t^{a-1}e^{-t}\,dt\) be the upper incomplete gamma,

  • \(P(a,u)=\gamma(a,u)/\Gamma(a)\) and \(Q(a,u)=\Gamma(a,u)/\Gamma(a)=1-P(a,u)\) be the regularized versions.

Then

\[\begin{split} F_X(x) = \begin{cases} 0, & x\le \mathrm{loc},\\ P(a, u), & x>\mathrm{loc},\; c>0,\\ Q(a, u), & x>\mathrm{loc},\; c<0. \end{cases} \end{split}\]

The sign dependence comes from the fact that \(x \mapsto x^c\) is increasing for \(c>0\) and decreasing for \(c<0\).

def _validate_params(a: float, c: float, scale: float) -> None:
    if not (a > 0):
        raise ValueError("Parameter 'a' must be > 0")
    if c == 0:
        raise ValueError("Parameter 'c' must be non-zero")
    if not (scale > 0):
        raise ValueError("Parameter 'scale' must be > 0")


def gengamma_logpdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    """Log-PDF of SciPy's generalized gamma parameterization.

    Standard form (loc=0, scale=1):
        f(x; a,c) = |c| x^{ca-1} exp(-x^c) / Gamma(a),   x>0

    Notes:
      - Works for any real c != 0.
      - Uses log space for numerical stability.
    """

    a = float(a)
    c = float(c)
    loc = float(loc)
    scale = float(scale)
    _validate_params(a, c, scale)

    x = np.asarray(x, dtype=float)
    y = (x - loc) / scale

    out = np.full_like(y, -np.inf, dtype=float)
    mask = y > 0

    if not np.any(mask):
        return out

    ym = y[mask]

    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        out[mask] = (
            np.log(abs(c))
            - np.log(scale)
            - special.gammaln(a)
            + (c * a - 1.0) * np.log(ym)
            - np.power(ym, c)
        )

    return out


def gengamma_pdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    return np.exp(gengamma_logpdf(x, a, c, loc=loc, scale=scale))


def gengamma_cdf(x: np.ndarray, a: float, c: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
    """CDF expressed via regularized incomplete gamma functions.

    For u = ((x-loc)/scale)^c:
      - if c>0: F = P(a,u) = gammainc(a,u)
      - if c<0: F = Q(a,u) = gammaincc(a,u)
    """

    a = float(a)
    c = float(c)
    loc = float(loc)
    scale = float(scale)
    _validate_params(a, c, scale)

    x = np.asarray(x, dtype=float)
    y = (x - loc) / scale

    out = np.zeros_like(y, dtype=float)
    mask = y > 0

    if not np.any(mask):
        return out

    ym = y[mask]
    u = np.power(ym, c)

    if c > 0:
        out[mask] = special.gammainc(a, u)
    else:
        out[mask] = special.gammaincc(a, u)

    # Defensive clipping for tiny numerical drift.
    return np.clip(out, 0.0, 1.0)


def gengamma_raw_moment(k: float, a: float, c: float, scale: float = 1.0) -> float:
    """Raw moment E[X^k] for loc=0.

    Exists when a + k/c > 0.
    """

    a = float(a)
    c = float(c)
    scale = float(scale)
    _validate_params(a, c, scale)

    t = a + k / c
    if t <= 0:
        return float("nan")

    return float(np.exp(special.gammaln(t) - special.gammaln(a) + k * np.log(scale)))


def gengamma_mvsk(a: float, c: float, scale: float = 1.0) -> tuple[float, float, float, float]:
    """Mean/variance/skew/(excess)kurtosis for loc=0, if they exist."""

    m1 = gengamma_raw_moment(1, a, c, scale)
    m2 = gengamma_raw_moment(2, a, c, scale)
    m3 = gengamma_raw_moment(3, a, c, scale)
    m4 = gengamma_raw_moment(4, a, c, scale)

    mean = m1
    var = m2 - m1**2

    if not np.isfinite(var) or var <= 0:
        return mean, var, float("nan"), float("nan")

    mu3 = m3 - 3 * m2 * mean + 2 * mean**3
    mu4 = m4 - 4 * m3 * mean + 6 * m2 * mean**2 - 3 * mean**4

    skew = mu3 / var ** 1.5
    kurt_excess = mu4 / var**2 - 3.0

    return mean, var, skew, kurt_excess


def gengamma_entropy(a: float, c: float, scale: float = 1.0) -> float:
    """Differential entropy for loc=0, scale=scale."""

    a = float(a)
    c = float(c)
    scale = float(scale)
    _validate_params(a, c, scale)

    return float(
        np.log(scale)
        - np.log(abs(c))
        + special.gammaln(a)
        + a
        - (a - 1.0 / c) * special.digamma(a)
    )


def gengamma_rvs_numpy(
    a: float,
    c: float,
    loc: float = 0.0,
    scale: float = 1.0,
    size: int | tuple[int, ...] = 1,
    rng: np.random.Generator | None = None,
) -> np.ndarray:
    """NumPy-only sampler using the Gamma transform.

    Algorithm:
      1) draw Y ~ Gamma(a, 1) using NumPy
      2) return X = loc + scale * Y^(1/c)
    """

    a = float(a)
    c = float(c)
    loc = float(loc)
    scale = float(scale)
    _validate_params(a, c, scale)

    rng = np.random.default_rng() if rng is None else rng

    y = rng.gamma(shape=a, scale=1.0, size=size)
    return loc + scale * np.power(y, 1.0 / c)
# Sanity check: our formulas match SciPy.

a, c, loc, scale = 2.3, 1.7, 0.0, 0.8
x = np.logspace(-3, 2, 9)

pdf_ours = gengamma_pdf(x, a, c, loc=loc, scale=scale)
cdf_ours = gengamma_cdf(x, a, c, loc=loc, scale=scale)

pdf_scipy = gengamma_dist.pdf(x, a, c, loc=loc, scale=scale)
cdf_scipy = gengamma_dist.cdf(x, a, c, loc=loc, scale=scale)

print("max |pdf diff|:", float(np.max(np.abs(pdf_ours - pdf_scipy))))
print("max |cdf diff|:", float(np.max(np.abs(cdf_ours - cdf_scipy))))
print("allclose?", np.allclose(pdf_ours, pdf_scipy) and np.allclose(cdf_ours, cdf_scipy))
max |pdf diff|: 1.1102230246251565e-16
max |cdf diff|: 0.0
allclose? True

4) Moments & Properties#

Raw moments#

For loc=0 and scale \(=\beta\), if \(a + k/c > 0\),

\[ \mathbb{E}[X^k] = \beta^k \frac{\Gamma(a + k/c)}{\Gamma(a)}. \]

This follows immediately from \(Y=(X/\beta)^c \sim \mathrm{Gamma}(a,1)\).

Mean / variance#

When they exist (i.e. \(a + 1/c>0\) and \(a + 2/c>0\)):

\[ \mu = \mathbb{E}[X] = \beta \frac{\Gamma(a + 1/c)}{\Gamma(a)}, \]
\[ \mathrm{Var}(X) = \beta^2\left[\frac{\Gamma(a + 2/c)}{\Gamma(a)} - \left(\frac{\Gamma(a + 1/c)}{\Gamma(a)}\right)^2\right]. \]

Skewness and (excess) kurtosis can be expressed using \(m_k=\mathbb{E}[X^k]\) for \(k\le 4\).

Existence of moments#

Because \(m_k\) requires \(a + k/c>0\), moment existence depends on the sign of \(c\):

  • for \(c>0\), all positive raw moments exist (since \(a>0\) and \(k/c>0\))

  • for \(c<0\), sufficiently high moments may diverge (heavy tail)
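
A small numeric illustration of the existence condition, using the gengamma_raw_moment helper defined above; the values \(a=1.5\), \(c=-1\) are chosen so that the mean exists but the second moment does not:

# Moment existence: with a=1.5, c=-1 the mean exists (a + 1/c = 0.5 > 0)
# but E[X^2] diverges (a + 2/c = -0.5 <= 0), so the helper returns NaN.
a_ex, c_ex = 1.5, -1.0
for k in (1, 2, 3):
    print(f"k={k}: a + k/c = {a_ex + k / c_ex:+.1f} -> E[X^k] =", gengamma_raw_moment(k, a_ex, c_ex))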

MGF / characteristic function#

There is no simple closed form for \(M_X(t)=\mathbb{E}[e^{tX}]\) in general. A useful representation is the moment series

\[ M_X(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,\mathbb{E}[X^n] = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,\beta^n\frac{\Gamma(a+n/c)}{\Gamma(a)}, \]

when the series converges.

Rule of thumb for \(t>0\) (tail comparison):

  • \(c>1\): MGF exists for all real \(t\)

  • \(c=1\): Gamma case; MGF exists for \(t < 1/\beta\)

  • \(0<c<1\) or \(c<0\): MGF diverges for any \(t>0\) (tails are too heavy)

The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) always exists.
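
As a quick check of the moment series, consider the Gamma case \(c=1\), \(\beta=1\), where the MGF has the closed form \((1-t)^{-a}\) for \(t<1\); partial sums built from the raw moments should converge to it (the values of \(a\) and \(t\) below are arbitrary):

# Moment-series check in the Gamma case (c=1, scale=1): M(t) = (1 - t)^{-a} for t < 1.
a_m, t = 2.0, 0.4
terms = [t**n / special.factorial(n) * gengamma_raw_moment(n, a_m, 1.0) for n in range(60)]
print("moment series:", float(np.sum(terms)))
print("closed form  :", (1 - t) ** (-a_m))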

Entropy#

For loc=0 and scale \(=\beta\), the differential entropy is

\[ H(X) = \log \beta - \log|c| + \log \Gamma(a) + a - \left(a-\frac{1}{c}\right)\psi(a), \]

where \(\psi\) is the digamma function.

# Compare closed-form moments/entropy against SciPy.

a, c, scale = 4.42, -3.12, 2.0

mean, var, skew, kurt_excess = gengamma_mvsk(a, c, scale=scale)
mean_s, var_s, skew_s, kurt_excess_s = gengamma_dist.stats(a, c, scale=scale, moments="mvsk")

entropy = gengamma_entropy(a, c, scale=scale)
entropy_s = gengamma_dist.entropy(a, c, scale=scale)

print("mean:", mean, "(SciPy:", float(mean_s), ")")
print("var:", var, "(SciPy:", float(var_s), ")")
print("skew:", skew, "(SciPy:", float(skew_s), ")")
print("excess kurtosis:", kurt_excess, "(SciPy:", float(kurt_excess_s), ")")
print("entropy:", entropy, "(SciPy:", float(entropy_s), ")")
mean: 1.307135070065489 (SciPy: 1.3071350700654891 )
var: 0.04921521947520513 (SciPy: 0.049215219475204686 )
skew: 1.1262322238186198 (SciPy: 1.1262322238187161 )
excess kurtosis: 2.6645270107947754 (SciPy: 2.6645270107930443 )
entropy: -0.1699451239020231 (SciPy: -0.16994512390202365 )
# Empirical characteristic function via Monte Carlo (always exists).

a0, c0, scale0 = 2.0, 0.7, 1.0
samples = gengamma_rvs_numpy(a0, c0, scale=scale0, size=80_000, rng=rng)

t_grid = np.linspace(0, 12, 250)
phi_hat = np.array([np.mean(np.exp(1j * t * samples)) for t in t_grid])

fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=np.real(phi_hat), mode="lines", name="Re φ̂(t)"))
fig.add_trace(go.Scatter(x=t_grid, y=np.imag(phi_hat), mode="lines", name="Im φ̂(t)"))

fig.update_layout(
    title="Empirical characteristic function of gengamma (Monte Carlo)",
    xaxis_title="t",
    yaxis_title="value",
    width=900,
    height=420,
)
fig

5) Parameter Interpretation#

SciPy’s gengamma uses two shape parameters a and c, plus the standard loc and scale.

The most useful mental model#

Let

\[ Y = \left(\frac{X-\mathrm{loc}}{\mathrm{scale}}\right)^c. \]

Then \(Y \sim \mathrm{Gamma}(a,1)\). So you can interpret parameters through the Gamma distribution plus a power transform.

What the parameters do#

  • scale (\(\beta\)): multiplies \(X\) and sets the overall units / typical magnitude.

  • loc: shifts the distribution; the support becomes \(x>\mathrm{loc}\).

  • a: the Gamma shape of \(Y\); larger a typically moves mass away from 0 and reduces relative variability.

  • c: controls the power transform.

    • for \(c>1\) the tail is thinner (decays faster than exponential in \(x\))

    • for \(0<c<1\) the tail is heavier (stretched-exponential)

    • for \(c<0\) the tail becomes polynomial-heavy, and high-order moments may not exist
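
The tail behavior in the bullets above can be made visible by plotting survival functions on a log scale (a sketch using SciPy's sf; the parameter values are arbitrary):

# Tail comparison: survival function P(X > x) on a log scale for a=2 and several c.
x_tail = np.linspace(0.5, 6.0, 400)

fig_tail = go.Figure()
for cc in [2.0, 1.0, 0.5, -1.0]:
    fig_tail.add_trace(go.Scatter(x=x_tail, y=gengamma_dist.sf(x_tail, 2.0, cc), mode="lines", name=f"c={cc}"))

fig_tail.update_layout(
    title="Tail comparison (a=2): larger c decays faster; c<0 gives a power-law tail",
    xaxis_title="x",
    yaxis_title="P(X > x)",
    yaxis_type="log",
    width=900,
    height=420,
)
fig_tail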

Below we visualize how changing a, c, and scale changes the PDF.

x = np.linspace(1e-4, 8, 800)

# 1) Vary a with c fixed (Gamma special case: c=1)
c_fixed = 1.0
fig1 = go.Figure()
for a in [0.7, 1.0, 2.0, 5.0]:
    fig1.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a, c_fixed), mode="lines", name=f"a={a}"))

fig1.update_layout(
    title="Effect of a (fix c=1 → Gamma family)",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig1
# 2) Vary c with a fixed

a_fixed = 2.0
fig2 = go.Figure()
for c in [0.5, 1.0, 2.0]:
    fig2.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a_fixed, c), mode="lines", name=f"c={c}"))

fig2.update_layout(
    title="Effect of c (fix a=2)",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig2
# 3) Scale stretches the x-axis

a, c = 2.0, 1.5
fig3 = go.Figure()
for s in [0.5, 1.0, 2.0]:
    fig3.add_trace(go.Scatter(x=x, y=gengamma_pdf(x, a, c, scale=s), mode="lines", name=f"scale={s}"))

fig3.update_layout(
    title="Effect of scale (fix a=2, c=1.5)",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig3

6) Derivations#

We focus on loc=0 for clarity. Adding loc is a simple shift.

6.1 PDF via a Gamma change of variables#

Let \(Y \sim \mathrm{Gamma}(a,1)\) with PDF

\[ f_Y(y) = \frac{1}{\Gamma(a)} y^{a-1} e^{-y}, \qquad y>0. \]

Define the transformation

\[ X = \beta\,Y^{1/c} \quad (\beta>0), \qquad\text{so}\qquad Y = \left(\frac{X}{\beta}\right)^c. \]

The Jacobian is

\[ \left|\frac{dY}{dX}\right| = \left|\frac{c}{\beta}\left(\frac{X}{\beta}\right)^{c-1}\right|. \]

Therefore

\[\begin{split} \begin{aligned} f_X(x) &= f_Y\!\left(\left(\frac{x}{\beta}\right)^c\right)\,\left|\frac{d}{dx}\left(\frac{x}{\beta}\right)^c\right|\\ &= \frac{1}{\Gamma(a)}\left(\frac{x}{\beta}\right)^{c(a-1)}\exp\!\left[-\left(\frac{x}{\beta}\right)^c\right]\,\left|\frac{c}{\beta}\left(\frac{x}{\beta}\right)^{c-1}\right|\\ &= \frac{|c|}{\beta\,\Gamma(a)}\left(\frac{x}{\beta}\right)^{ca-1}\exp\!\left[-\left(\frac{x}{\beta}\right)^c\right], \qquad x>0. \end{aligned} \end{split}\]

Setting \(\beta=1\) recovers the standardized PDF from SciPy’s docstring.

6.2 Expectation and variance#

Using the same transform,

\[ \mathbb{E}[X^k] = \beta^k\,\mathbb{E}\left[Y^{k/c}\right] = \beta^k\,\frac{\Gamma(a+k/c)}{\Gamma(a)}, \]

as long as \(a+k/c>0\). Mean and variance are obtained by plugging in \(k=1\) and \(k=2\).
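
A quick numerical confirmation of this moment identity, integrating \(x^k f(x)\) with scipy.integrate.quad and comparing against the Gamma-ratio formula (the parameter values are arbitrary):

# Check E[X^k] = beta^k Gamma(a + k/c) / Gamma(a) by numerical integration.
from scipy.integrate import quad

a_d, c_d, beta_d = 2.3, 1.7, 0.8
for k in (1, 2, 3):
    integral, _ = quad(lambda x, k=k: x**k * float(gengamma_pdf(x, a_d, c_d, scale=beta_d)), 0, np.inf)
    print(f"k={k}: quad = {integral:.6f}   closed form = {gengamma_raw_moment(k, a_d, c_d, scale=beta_d):.6f}")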

6.3 Likelihood (i.i.d. sample)#

For observations \(x_1,\dots,x_n>0\) (with loc=0), the likelihood is

\[ L(a,c,\beta; x_{1:n}) = \prod_{i=1}^n \frac{|c|}{\beta\,\Gamma(a)}\left(\frac{x_i}{\beta}\right)^{ca-1}\exp\!\left[-\left(\frac{x_i}{\beta}\right)^c\right]. \]

The log-likelihood is

\[ \ell(a,c,\beta) = n\log|c| - n\log\Gamma(a) + (ca-1)\sum_{i=1}^n\log x_i - nca\log\beta - \sum_{i=1}^n\left(\frac{x_i}{\beta}\right)^c. \]

Closed-form MLEs are not available in general; numerical optimization is typically used (as in scipy.stats.gengamma.fit).
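
As an illustration of that numerical optimization, here is a minimal MLE sketch (not SciPy's internal fitting routine): it fixes loc=0, assumes all three parameters are positive, and minimizes the negative log-likelihood over log-parameters with Nelder-Mead, using a separate seed so later cells are unaffected.

# Minimal MLE sketch: minimize the negative log-likelihood over (log a, log c, log beta).
from scipy.optimize import minimize

def negloglik(theta, data):
    a_, c_, beta_ = np.exp(theta)  # enforce a, c, beta > 0
    return -np.sum(gengamma_logpdf(data, a_, c_, scale=beta_))

data_mle = gengamma_rvs_numpy(3.0, 1.2, scale=2.0, size=2000, rng=np.random.default_rng(0))
res = minimize(negloglik, x0=np.log([1.0, 1.0, 1.0]), args=(data_mle,), method="Nelder-Mead")

print("converged        :", res.success)
print("MLE (a, c, scale):", np.exp(res.x))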

7) Sampling & Simulation#

The Gamma transform gives an immediate sampling algorithm.

Algorithm (NumPy-only) for loc=0:

  1. Sample \(Y \sim \mathrm{Gamma}(a,1)\) (NumPy can do this directly).

  2. Return \(X = \mathrm{scale}\,Y^{1/c}\).

For general loc, shift: \(X = \mathrm{loc} + \mathrm{scale}\,Y^{1/c}\).

Below we validate sampling by checking:

  • sample mean/variance vs the closed forms

  • that \((X/\mathrm{scale})^c\) looks Gamma-distributed

a0, c0, scale0 = 3.0, 1.2, 2.0

samples = gengamma_rvs_numpy(a0, c0, scale=scale0, size=200_000, rng=rng)

mean_true, var_true, *_ = gengamma_mvsk(a0, c0, scale=scale0)
mean_hat = float(np.mean(samples))
var_hat = float(np.var(samples))

print("mean (MC)  :", mean_hat)
print("mean (true):", mean_true)
print("var  (MC)  :", var_hat)
print("var  (true):", var_true)
mean (MC)  : 4.887246702960949
mean (true): 4.886184597056009
var  (MC)  : 5.527400457004914
var  (true): 5.548009631523051
# Validate the transform: Y = (X/scale)^c should be Gamma(a,1)

y = np.power(samples / scale0, c0)

grid = np.linspace(0, np.quantile(y, 0.995), 700)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=y,
        nbinsx=70,
        histnorm="probability density",
        name="Y=(X/scale)^c (Monte Carlo)",
        opacity=0.55,
    )
)
fig.add_trace(
    go.Scatter(
        x=grid,
        y=stats.gamma(a0, scale=1.0).pdf(grid),
        mode="lines",
        name=f"Gamma(a={a0}, scale=1)",
        line=dict(width=3),
    )
)

fig.update_layout(
    title="Power transform check: (X/scale)^c ~ Gamma(a,1)",
    xaxis_title="y",
    yaxis_title="density",
    width=900,
    height=420,
)
fig

8) Visualization#

We’ll compare:

  • the theoretical PDF and CDF

  • Monte Carlo samples (NumPy-only sampler)

  • SciPy’s implementation

# PDF + histogram (Monte Carlo)

x_grid = np.linspace(1e-4, np.quantile(samples, 0.995), 900)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=samples,
        nbinsx=70,
        histnorm="probability density",
        name="Monte Carlo (NumPy-only)",
        opacity=0.55,
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=gengamma_pdf(x_grid, a0, c0, scale=scale0),
        mode="lines",
        name="PDF (closed form)",
        line=dict(width=3),
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=gengamma_dist.pdf(x_grid, a0, c0, scale=scale0),
        mode="lines",
        name="PDF (SciPy)",
        line=dict(width=2, dash="dot"),
    )
)

fig.update_layout(
    title=f"gengamma(a={a0}, c={c0}, scale={scale0}): histogram vs PDF",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
# CDF: theoretical vs empirical

emp_x = np.sort(samples)
emp_cdf = np.arange(1, emp_x.size + 1) / emp_x.size

x_grid = np.linspace(0, np.quantile(samples, 0.995), 700)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=gengamma_cdf(x_grid, a0, c0, scale=scale0), mode="lines", name="CDF (closed form)"))
fig.add_trace(
    go.Scatter(
        x=emp_x[::400],
        y=emp_cdf[::400],
        mode="markers",
        name="Empirical CDF (subsampled)",
        marker=dict(size=4, opacity=0.6),
    )
)

fig.update_layout(
    title=f"gengamma(a={a0}, c={c0}, scale={scale0}): theoretical vs empirical CDF",
    xaxis_title="x",
    yaxis_title="CDF",
    width=900,
    height=420,
)
fig

9) SciPy Integration (scipy.stats.gengamma)#

scipy.stats.gengamma provides the usual distribution API:

  • gengamma.pdf(x, a, c, loc=0, scale=1)

  • gengamma.cdf(x, a, c, loc=0, scale=1)

  • gengamma.rvs(a, c, loc=0, scale=1, size=..., random_state=...)

  • gengamma.fit(data, ...) (MLE)

A common pattern is to freeze the distribution: rv = gengamma(a, c, loc=..., scale=...), then call rv.pdf, rv.cdf, rv.rvs, etc.
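
A minimal usage sketch of the frozen-distribution pattern (the parameter values are arbitrary):

# Freeze the distribution once, then reuse it for summaries, quantiles, and sampling.
rv = gengamma_dist(2.0, 1.5, loc=0.0, scale=1.3)

print("mean, std :", rv.mean(), rv.std())
print("median    :", rv.ppf(0.5))
print("90% range :", rv.interval(0.9))
print("P(X > 3)  :", rv.sf(3.0))
print("3 draws   :", rv.rvs(size=3, random_state=np.random.default_rng(1)))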

# Basic SciPy usage + fitting

a_true, c_true, loc_true, scale_true = 3.0, 1.2, 0.0, 2.0

rv_true = gengamma_dist(a_true, c_true, loc=loc_true, scale=scale_true)
data = rv_true.rvs(size=3000, random_state=rng)

# Fit with loc fixed to 0 for stability/interpretability
# (If loc is free, optimizers can sometimes trade off loc/scale.)
a_hat, c_hat, loc_hat, scale_hat = gengamma_dist.fit(data, floc=0.0)

print("true params: ", (a_true, c_true, loc_true, scale_true))
print("fitted params:", (float(a_hat), float(c_hat), float(loc_hat), float(scale_hat)))

rv_fit = gengamma_dist(a_hat, c_hat, loc=loc_hat, scale=scale_hat)

x_grid = np.linspace(1e-4, np.quantile(data, 0.995), 800)

fig = go.Figure()
fig.add_trace(go.Histogram(x=data, nbinsx=70, histnorm="probability density", name="data", opacity=0.55))
fig.add_trace(go.Scatter(x=x_grid, y=rv_true.pdf(x_grid), mode="lines", name="true pdf", line=dict(width=3)))
fig.add_trace(go.Scatter(x=x_grid, y=rv_fit.pdf(x_grid), mode="lines", name="fitted pdf", line=dict(width=2, dash="dot")))

fig.update_layout(
    title="SciPy fit: fitted PDF vs true PDF",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
true params:  (3.0, 1.2, 0.0, 2.0)
fitted params: (2.8464391259204764, 1.2301080320124074, 0.0, 2.133363346645424)

10) Statistical Use Cases#

Hypothesis testing (goodness-of-fit)#

If parameters are specified (not estimated from the same data), you can test whether data plausibly comes from a generalized gamma using goodness-of-fit tests such as Kolmogorov–Smirnov (KS).

Caveat: if you estimate parameters from the data and then run KS on the same data, the usual p-values are no longer exact (you need a corrected procedure or bootstrap).

Bayesian modeling#

The likelihood is available in closed form, so you can do Bayesian inference with generic methods. There is no simple conjugate prior for \((a,c,\mathrm{scale})\) in general, but you can still:

  • infer a subset of parameters (e.g., scale) with grid methods

  • use MCMC / variational inference frameworks for full Bayesian inference

Generative modeling#

gengamma is useful as a flexible positive noise model, or as a component in a mixture model when you want strictly-positive data with tunable skew/tails.

Because high-order moments may not exist for some parameter settings (especially \(c<0\)), prefer robust summaries (medians/quantiles) when exploring or evaluating fits.

# Hypothesis testing example: KS test when parameters are known.

from scipy.stats import kstest

# Generate data from the known model
rv = gengamma_dist(2.5, 1.3, scale=1.7)
x = rv.rvs(size=800, random_state=rng)

# One-sample KS test against the known CDF
stat, p_value = kstest(x, rv.cdf)
print("KS statistic:", float(stat))
print("p-value:", float(p_value))
KS statistic: 0.06359559061112985
p-value: 0.0029546837073747357
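
When the parameters are estimated from the same data, the caveat above applies. A parametric-bootstrap sketch of a corrected p-value follows: loc is fixed at 0, B is kept small to limit runtime, and a separate seed is used so later cells are unaffected.

# Parametric bootstrap KS: account for parameters estimated from the same data.
rng_boot = np.random.default_rng(123)
x_obs = gengamma_dist(2.5, 1.3, scale=1.7).rvs(size=400, random_state=rng_boot)

a_h, c_h, loc_h, s_h = gengamma_dist.fit(x_obs, floc=0.0)
stat_obs, _ = kstest(x_obs, gengamma_dist(a_h, c_h, loc=loc_h, scale=s_h).cdf)

B = 100
boot_stats = np.empty(B)
for b in range(B):
    xb = gengamma_dist(a_h, c_h, loc=loc_h, scale=s_h).rvs(size=x_obs.size, random_state=rng_boot)
    ab, cb, lb, sb = gengamma_dist.fit(xb, floc=0.0)
    boot_stats[b], _ = kstest(xb, gengamma_dist(ab, cb, loc=lb, scale=sb).cdf)

print("KS statistic (fitted):", float(stat_obs))
print("bootstrap p-value    :", float(np.mean(boot_stats >= stat_obs)))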
# Bayesian modeling example: grid posterior for scale with (a,c) assumed known.

# Data generated from a known model
true_a, true_c, true_scale = 3.0, 1.2, 2.0
x = gengamma_dist(true_a, true_c, scale=true_scale).rvs(size=500, random_state=rng)

# Grid over scale (log-spaced)
scale_grid = np.exp(np.linspace(np.log(0.3), np.log(5.0), 400))

loglik = np.array([np.sum(gengamma_logpdf(x, true_a, true_c, scale=s)) for s in scale_grid])

# Log-uniform prior p(scale) ∝ 1/scale  -> log prior = -log(scale)
logpost_unnorm = loglik - np.log(scale_grid)

# Normalize on the grid
logpost = logpost_unnorm - np.max(logpost_unnorm)
post = np.exp(logpost)
post /= np.trapz(post, scale_grid)

scale_map = float(scale_grid[np.argmax(post)])

# 90% credible interval from the grid
cdf_post = np.cumsum(post * np.diff(scale_grid, prepend=scale_grid[0]))
lo = float(np.interp(0.05, cdf_post, scale_grid))
hi = float(np.interp(0.95, cdf_post, scale_grid))

print("true scale:", true_scale)
print("MAP scale :", scale_map)
print("90% CI    :", (lo, hi))

fig = go.Figure()
fig.add_trace(go.Scatter(x=scale_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=true_scale, line_dash="dash", line_width=2, annotation_text="true")
fig.add_vline(x=scale_map, line_dash="dot", line_width=2, annotation_text="MAP")

fig.update_layout(
    title="Posterior over scale (a,c known)",
    xaxis_title="scale",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
true scale: 2.0
MAP scale : 1.8631642601470884
90% CI    : (1.790712999885101, 1.9140843629849709)
# Generative modeling example: fit gengamma and generate synthetic samples.

# Simulate a positive dataset
rv_true = gengamma_dist(3.2, 0.9, scale=1.5)
data = rv_true.rvs(size=2000, random_state=rng)

# Fit (fix loc to 0)
a_hat, c_hat, loc_hat, scale_hat = gengamma_dist.fit(data, floc=0.0)
rv_fit = gengamma_dist(a_hat, c_hat, loc=loc_hat, scale=scale_hat)

sim = rv_fit.rvs(size=data.size, random_state=rng)

qs = np.array([0.1, 0.5, 0.9])
print("quantiles:")
print("  data:", np.quantile(data, qs))
print("  sim :", np.quantile(sim, qs))

x_grid = np.linspace(1e-4, np.quantile(data, 0.995), 800)

fig = go.Figure()
fig.add_trace(go.Histogram(x=data, nbinsx=70, histnorm="probability density", name="data", opacity=0.5))
fig.add_trace(go.Histogram(x=sim, nbinsx=70, histnorm="probability density", name="sim (fitted)", opacity=0.35))
fig.add_trace(go.Scatter(x=x_grid, y=rv_fit.pdf(x_grid), mode="lines", name="fitted pdf", line=dict(width=3)))

fig.update_layout(
    title="Generative check: fitted gengamma samples vs data",
    xaxis_title="x",
    yaxis_title="density",
    width=900,
    height=420,
)
fig
quantiles:
  data: [ 1.82517  4.77339 10.05547]
  sim : [1.93852 4.65291 9.56727]

11) Pitfalls#

  • Invalid parameters: \(a\le 0\), scale \(\le 0\), or \(c=0\) are not allowed; the support is \(x>\mathrm{loc}\).

  • Moment existence (especially for \(c<0\)): raw moment \(\mathbb{E}[X^k]\) exists only when \(a + k/c > 0\).

  • Numerical issues for extreme values:

    • prefer logpdf/logcdf when multiplying many densities or working in the far tail

    • for large \(|c|\) or large \(x\), computing \(x^c\) can overflow/underflow; use log-space when possible (see the example after this list)

  • Fitting sensitivity / identifiability: two shape parameters + scale can lead to flat likelihood regions; consider fixing loc, using good initial values, and validating with QQ/CDF plots.

  • Interpretation: loc shifts the support; if your quantity is inherently nonnegative, fixing loc=0 often yields a more interpretable fit.
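
A small illustration of the log-space point above, using the helpers defined earlier (the evaluation point is arbitrary):

# Far-tail evaluation: the pdf underflows to exactly 0, while the logpdf stays finite.
x_far = 40.0
print("pdf   :", float(gengamma_pdf(x_far, 2.0, 2.0)))     # underflows to 0.0
print("logpdf:", float(gengamma_logpdf(x_far, 2.0, 2.0)))  # finite (about -1588)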

12) Summary#

  • gengamma is a continuous distribution on \((0,\infty)\) (or \(x>\mathrm{loc}\) with location/scale).

  • Its PDF is \(f(x;a,c)=|c|\,x^{ca-1}e^{-x^c}/\Gamma(a)\) (standard form).

  • The power transform \((X/\mathrm{scale})^c\) is Gamma-distributed, which yields closed-form raw moments and a simple sampling algorithm.

  • Special cases include Gamma (\(c=1\)), Weibull (\(a=1, c>0\)), Exponential (\(a=1,c=1\)), and Chi (with appropriate parameters).

  • Some parameter settings (notably \(c<0\)) can make higher moments diverge; use robust summaries and numerical care.

References#

  • E. W. Stacy (1962). “A Generalization of the Gamma Distribution.” Annals of Mathematical Statistics.

  • Johnson, Kotz, Balakrishnan. Continuous Univariate Distributions, Wiley.

  • SciPy documentation: scipy.stats.gengamma.